Ophthalmology Science — Latest Matching Preprints

1

Uncertainty-Gated Glaucoma Screening: Combining Semi-Supervised Classification with Multi-Agent Large Language Model Deliberation

Garimella Narasimha, S. V.; Brown, N.; Sridhar, S.

2026-04-20 ophthalmology 10.64898/2026.04.17.26351127 medRxiv

Top 0.1%

17.3%

Automated glaucoma screening from optical coherence tomography (OCT) faces two persistent challenges: scarcity of expert-labeled data and unreliable model predictions on diagnostically ambiguous cases. We present a two-tier diagnostic pipeline that addresses both. In the first tier, an EfficientNetV2-S classifier trained under a semi-supervised pseudo supervisor framework achieves 0.84 AUC on 150 held-out test patients from the Harvard Glaucoma Detection and Progression dataset, using only 350 labeled training samples out of 700. In the second tier, 124 flagged cases are routed to a multi-agent system built on MedGemma 4B, where three specialist agents deliberate over three rounds before rendering a final diagnosis. On these flagged cases, the agent system achieves 100% sensitivity (detecting all 55 glaucoma cases with zero missed diagnoses) and 89.5% overall accuracy (111/124), compared to the classifier's 73.4% (91/124). Uncertainty analysis confirms that the classifier's output probability reliably separates confident predictions (96.3% accuracy, n = 27) from uncertain ones (74.0%, n = 123), producing a 22-percentage-point gap that serves as a triage signal. The agents fix 32 cases the classifier misclassifies while introducing 12 new errors, yielding a net improvement of 20 cases. These results are from a single training run without variance estimates and should be interpreted as preliminary evidence that uncertainty-gated routing to vision-language model agents can meaningfully improve diagnostic accuracy on the cases where automated classifiers are least reliable.
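The accuracy figures quoted for the 124 flagged cases can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts stated in the abstract; the variable and function names are illustrative and do not come from the preprint.

```python
# Counts quoted in the abstract for the 124 flagged cases.
FLAGGED = 124
CLASSIFIER_CORRECT = 91
AGENT_CORRECT = 111
FIXED_BY_AGENTS = 32    # classifier wrong, agents right
NEW_AGENT_ERRORS = 12   # classifier right, agents wrong


def accuracy_percent(correct: int, total: int) -> float:
    """Return accuracy as a percentage rounded to one decimal place."""
    return round(100.0 * correct / total, 1)


# Net change in correctly classified cases after agent deliberation.
net_gain = FIXED_BY_AGENTS - NEW_AGENT_ERRORS
classifier_acc = accuracy_percent(CLASSIFIER_CORRECT, FLAGGED)
agent_acc = accuracy_percent(AGENT_CORRECT, FLAGGED)

print(net_gain, classifier_acc, agent_acc)
```

The net gain of 20 cases accounts exactly for the difference between 91 and 111 correct diagnoses, so the reported counts are internally consistent.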

2

Comparison of foundation models and transfer learning strategies for diabetic retinopathy classification

Li, L. Y.; Lebiecka-Johansen, B.; Byberg, S.; Thambawita, V.; Hulman, A.

2026-04-20 health informatics 10.64898/2026.04.17.26351092 medRxiv

Top 0.1%

3.7%

Diabetic retinopathy (DR) is a leading cause of vision impairment, requiring accurate and scalable diagnostic tools. Foundation models are increasingly applied to clinical imaging, but concerns remain about their calibration. We evaluated DINOv3, RETFound, and VisionFM for DR classification using different transfer learning strategies in BRSET (n = 16,266) and mBRSET (n = 5,164). Models achieved high discrimination in binary classification (normal vs retinopathy) in BRSET (AUROC 0.90-0.98), with DINOv3 achieving the best under full fine-tuning (AUROC 0.98 [95% CI: 0.97-0.99]). External validation on mBRSET showed decreased performance for all models regardless of the fine-tuning strategy (AUROC 0.70-0.85), though fine-tuning improved performance. Foundation models achieved strong discrimination but poor calibration, generally overestimating DR risk. While the generalist model, DINOv3, benefited from deeper fine-tuning, miscalibration remained evident. These findings underscore the need to improve calibration and to evaluate foundation models comprehensively, both of which are essential in clinical settings. Author summary: Artificial intelligence is increasingly being used to detect eye diseases such as diabetic retinopathy from retinal images. Recent advances have introduced "foundation models," which are trained on large datasets and can be adapted to new tasks. We aimed to evaluate how well these models perform in a clinical prediction context, with a focus not only on accuracy but also on how reliably they estimate disease risk. In this study, we compared different types of foundation models using two independent datasets from Brazil. We found that while these models were generally good at distinguishing between healthy and diseased eyes, their predicted risks were often poorly calibrated. In other words, the estimated probabilities did not consistently reflect the true likelihood of disease. 
We also examined whether adapting the models to the target population could improve performance. Although this approach led to improvements, calibration issues remained. However, post-training correction improved the agreement between predicted risks and observed outcomes. Our findings highlight an important gap between model performance and clinical usefulness. We suggest that improving the reliability of risk estimates is essential before such systems can be safely used in healthcare.
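To make the distinction between discrimination and calibration concrete, here is a minimal sketch of one common calibration summary, the expected calibration error (ECE). The abstract does not say which calibration measure the authors used, so the metric, the binning scheme, and the toy data are all assumptions for illustration.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted mean gap between predicted risk and observed event rate per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_p = sum(p for p, _ in members) / len(members)
        event_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(mean_p - event_rate)
    return ece


# Toy example (hypothetical data): risk is overestimated, since the model
# predicts 0.85 for every eye but only half of them have retinopathy.
overconfident = expected_calibration_error([0.85] * 4, [1, 1, 0, 0])

# Toy example: predicted risk of 0.5 matches the observed 50% event rate.
calibrated = expected_calibration_error([0.5, 0.5], [1, 0])
```

A model can rank eyes well (high AUROC) and still score poorly here, which is the gap the abstract describes.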

3

Multimodal prediction of visual improvement in diabetic macular edema using real-world electronic health records and optical coherence tomography images

Sun, S.; Cai, C. X.; Fan, R.; You, S.; Tran, D.; Rao, P. K.; Suchard, M. A.; Wang, Y.; Lee, C. S.; Lee, A. Y.; Zhang, L.

2026-04-24 health informatics 10.64898/2026.04.23.26351616 medRxiv

Top 0.2%

1.8%

Multimodal learning has the potential to improve clinical prediction by integrating complementary data sources, but the incremental value of imaging beyond structured electronic health record (EHR) data remains unclear in real-world settings. We developed a multimodal survival modeling framework integrating optical coherence tomography (OCT) and EHR data to predict time to visual improvement in patients with diabetic macular edema (DME), and evaluated how different ophthalmic foundation model representations contribute to prognostic performance. In a retrospective cohort of 973 patients (1,450 eyes) receiving anti-vascular endothelial growth factor therapy, we compared multimodal models combining 22,227 EHR variables with 196,402 OCT images, with OCT embeddings derived from three ophthalmic foundation models (RETFound, EyeCLIP, and VisionFM). The EHR-only model showed minimal prognostic discrimination (C-index 0.50 [95% CI, 0.45-0.55]). Incorporating OCT improved performance, with the magnitude of improvement depending on the representation. EHR+RETFound achieved the strongest performance (C-index 0.59 [0.54-0.65]), followed by EHR+EyeCLIP (0.57 [0.52-0.62]) and EHR+VisionFM (0.56 [0.51-0.61]). Multimodal models, particularly EHR+RETFound, demonstrated improved risk stratification with clearer separation of Kaplan-Meier curves. Partial information decomposition revealed that prognostic information was dominated by modality-specific contributions, with OCT and EHR providing largely distinct signals and minimal shared information. The magnitude of OCT-specific contribution varied across foundation models and aligned with observed performance differences. These findings indicate that OCT provides complementary prognostic value beyond structured clinical data, but gains are modest and depend strongly on representation choice. 
Our results highlight both the promise of multimodal modeling for personalized prognosis and the need for rigorous, context-specific evaluation of foundation models in real-world clinical settings.
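The C-index values above are easier to read with the definition in hand. The following is a minimal sketch of Harrell's concordance index in plain Python; it assumes that a higher model score means earlier visual improvement, and the function and argument names are illustrative rather than taken from the preprint.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs that the scores rank correctly.

    A pair (i, j) is comparable when subject i had the event and did so
    before subject j's observed time. Tied scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): months to improvement, event flag, model score.
times = [1, 2, 3]
events = [1, 1, 0]
perfect = concordance_index(times, events, [3.0, 2.0, 1.0])
uninformative = concordance_index(times, events, [1.0, 1.0, 1.0])
```

A score that carries no ranking information yields 0.5, which is why the EHR-only result of 0.50 is described as minimal discrimination.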

4

Proposed Classification System for the 445 nm Blue Light Laser for Treatment of Laryngeal Lesions

Khan, M.; Islam, A. M.; Abdel-Aty, Y.; Rosow, D.; Mallur, P.; Johns, M.; Rosen, C. A.; Bensoussan, Y. E.

2026-04-22 otolaryngology 10.64898/2026.04.20.26351290 medRxiv

Top 0.2%

1.7%

Objective: Only preliminary investigations on the use of the 445 nanometer wavelength blue light laser (BLL) for various laryngeal pathologies have been described. Currently, no standard exists for reporting treatment technique and tissue effect with this modality. Here, we aim to establish and validate a classification system to describe laser-induced tissue effects. Study Design: Retrospective video-based study for classification development and reliability validation. Methods: Video recordings from procedures performed with the BLL by multiple academic laryngologists were retrospectively reviewed. A preliminary 6-point classification (BLL 1-6) was developed based on expert consensus. Thirteen additional procedural clips were independently rated using the classification schema to assess perceived tissue effect and to measure inter- and intra-rater reliability. Results: The final 5-point classification system (BLL 1-5) included angiolysis, blanching, tissue vaporization, ablation with mechanical tissue removal, and cutting. The consensus of the combined reviewers in rating all cases was 89% (58 of 65). Complete consensus was not achieved in 11% (7/65) of cases. Of those, 57% (4/7) were clips illustrating the BLL-2 classification. Intra-rater reliability amongst the reviewers was 100%. Conclusion: Tissue effect of the 445 nm blue light laser can reliably be standardized with this proposed classification system. This rating system can be used to facilitate future systematic study of outcomes and effective communication between laryngologists and trainees.

5

Interpretable AI for Accelerated Video-Based Surgical Skill Assessment: A Highlights-Reel Approach

Lafouti, M.; Feldman, L. S.; Hooshiar, A.

2026-04-20 medical education 10.64898/2026.04.18.26351193 medRxiv

Top 0.2%

1.0%

Background: Manual video-based evaluation of surgical skills can be time-consuming and delays trainee feedback. Artificial intelligence (AI) offers opportunities to automate aspects of assessment while maintaining clinician oversight. We developed an interpretable spatiotemporal model that classifies surgical expertise directly from endoscopic video in standardized training tasks and generates saliency-based "highlights reels" showing the most influential frames. Methods: An RGB pipeline combining InceptionV3 for spatial feature extraction and a gated recurrent unit (GRU) for temporal modeling was trained on the JIGSAWS dataset. The model outputs novice, intermediate, or expert labels. A rolling-window, low-latency evaluation at 30 fps with a stride of 10 frames was used. A motion-augmented variant fused RGB with optical-flow features. Spatial and temporal saliency maps highlighted key decision-making regions. Results: The RGB model achieved 95% accuracy (F1: 92% expert, 86% intermediate, 99% novice). Performance was strongest for novice and expert trials, while intermediate trials showed the lowest recall, consistent with greater ambiguity around the intermediate skill level. Saliency maps consistently emphasized tool-tissue interactions and peaked during technically demanding phases. The optical-flow variant underperformed (approximately 38% accuracy), which may reflect sensitivity to global camera motion and other non-informative motion patterns. Conclusions: This interpretable AI pipeline accurately classifies surgical skill while producing intuitive visual highlights. Future work will refine highlight thresholds and validate on laparoscopic inguinal hernia repair for real-world deployment.
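The rolling-window evaluation can be pictured as a list of overlapping frame ranges. The sketch below is an illustration only: the abstract gives the frame rate (30 fps) and the stride (10 frames) but not the window length, so the 30-frame window used here is a hypothetical value.

```python
FPS = 30      # frame rate stated in the abstract
STRIDE = 10   # stride stated in the abstract


def rolling_windows(n_frames, window, stride=STRIDE):
    """Return (start, end) frame index pairs; end is exclusive."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return [(s, s + window) for s in range(0, n_frames - window + 1, stride)]


# Hypothetical 2-second clip (60 frames) with an assumed 30-frame window.
windows = rolling_windows(n_frames=60, window=30)

# Time between consecutive predictions, in seconds.
update_interval = STRIDE / FPS
```

With these settings a new prediction is available every third of a second, which is one way to read the "low-latency" description.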

6

Patient preferences for portable versus table-mounted visual field devices in rural Alabama: a mixed methods study within a telemedicine setting

Antwi-Adjei, E. K.; Datta, S.; Girkin, C. A.; Owsley, C.; Rhodes, L. A.; Fifolt, M.; Racette, L.

2026-04-25 ophthalmology 10.64898/2026.04.23.26351565 medRxiv

Top 0.3%

0.8%

Purpose: To evaluate patient satisfaction and preferences for portable versus table-mounted visual field (VF) devices in a rural telemedicine setting and identify influencing factors. Methods: We conducted a sequential explanatory mixed methods study at three Federally Qualified Health Centers (FQHCs) within the Alabama Screening and Intervention for Glaucoma and eye Health through Telemedicine (AL-SIGHT) study. Participants completed VF testing with table-mounted Humphrey Field Analyzer (HFA), tablet-based Melbourne Rapid Fields (MRF), and virtual reality (VR)-based VisuALL perimeters. Participants rated satisfaction, comfort, ease of use, and future testing preference. Chi-square tests assessed differences in device preferences. Twelve participants completed semi-structured interviews to explore reasons underlying preferences. Qualitative data were analyzed in NVivo 14 using reflexive thematic analysis. Results: Among 271 respondents (mean age 60.4 years; 62.4% women), 50.6% preferred VR-based, 35.1% tablet-based, and 14.4% table-mounted for future testing (χ²(2) = 53.52, p<0.001, Cramér's V = 0.31). Satisfaction was highest for VR-based (56.9% very satisfied), followed by tablet-based (49.4%), and HFA (38.0%). The VR-based perimeter was most frequently selected as the most comfortable (55.7%; χ²(2) = 63.33, p<0.001, V = 0.34) and easiest to use (54.6%; χ²(2) = 71.96, p<0.001, V = 0.36). Preferences did not vary significantly across demographic variables (all p>0.05). Qualitative themes identified four key drivers: comfort and physical experience, visual experience, ease of use and interaction, and psychological and motivational factors. Portability and community suitability were valued. Conclusion: Rural underserved patients strongly preferred portable visual field devices, particularly VR-based, over table-mounted HFA. Comfort, ergonomic flexibility, immersive visual experience, and simplicity of interaction were central determinants of preference. 
Portable perimetry may enhance patient-centered glaucoma monitoring within telemedicine programs and access in resource-limited settings.
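The preference statistic can be reproduced as a worked example. The sketch below assumes a chi-square goodness-of-fit test against equal preference for the three devices, which the abstract does not state explicitly, and it back-calculates the counts (137, 95, 39) from the reported percentages of 271 respondents; both are assumptions rather than values given in the source.

```python
import math


def chi_square_gof(observed):
    """Chi-square goodness-of-fit statistic against equal expected counts."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


def cramers_v_gof(chi2, n, k):
    """Cramér's V for a goodness-of-fit test with k categories."""
    return math.sqrt(chi2 / (n * (k - 1)))


# Counts inferred from 50.6%, 35.1%, and 14.4% of 271 (assumption).
counts = [137, 95, 39]   # VR-based, tablet-based, table-mounted
n = sum(counts)

chi2 = chi_square_gof(counts)
v = cramers_v_gof(chi2, n, len(counts))

print(round(chi2, 2), round(v, 2))
```

Under these assumptions the statistic and effect size round to the reported 53.52 and 0.31, with 2 degrees of freedom (three categories minus one).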

7

On the location of a "central retina" in mice

Günter, A.; Mühlfriedel, R.; Seeliger, M. W.

2026-04-21 neuroscience 10.64898/2026.04.16.718979 medRxiv

Top 0.3%

0.7%

The retinal topography of mammals reflects significant influences of the visual environment. In diurnal species, local specializations, such as the visual streak (VS) for panoramic vision and the area centralis or fovea for binocular vision, play a key role in optimizing visual perception and species viability. While the location of these sites is typically considered the retinal center, the definition of a "central retina" is less clear in nocturnal species. In mice, the most frequently used model in ophthalmologic research, the location of a central retina is hardly discernible in retinal images, neither in retinal structure (OCT sections) nor in vascular organization (SLO and angiography). In this study, we compare the murine retina with that of a diurnal rodent, the Mongolian gerbil (MG). We found that the S-opsin transitional zone (OTZ), a region characterized by the change from S- to M-opsin dominance along the dorsoventral opsin gradient in mice, has a similar relative position in the retina to the VS in the Mongolian gerbil, suggesting an evolutionary positional homology between these regions. Further, since the S-opsin-dominant region is optimized for visualizing the sky and the M-opsin-dominant region for visualizing the ground, the OTZ in between, much like the VS, naturally points toward the horizon. We therefore propose considering the OTZ as the position of a "central retinal area" in mice. Determining the anatomical-physiological center is particularly important to obtain meaningful relative measures such as averages across different retinal areas, as the common referencing to the optic nerve head (ONH) in mice does not take into account retinal organization and the eccentric position of the functional center.

8

Biventricular cardiac dynamic shape: genetics and cardiometabolic disease associations

Burns, R.; Young, W. J.; Uddin, K.; Petersen, S. E.; Ramirez, J.; Young, A. A.; Munroe, P. B.

2026-04-20 genetic and genomic medicine 10.64898/2026.04.19.26350940 medRxiv

Top 0.6%

0.1%

Background: Genetic studies using cardiac magnetic resonance (CMR) imaging have identified loci related to cardiac shape, but most focus on static morphology. The value of a dynamic cardiac shape atlas capturing both shape and function remains unknown. Methods: A dynamic shape atlas comprising CMR-derived shape models at end-diastole and end-systole was combined with genetic and outcome data in 36,992 UK Biobank participants. Dynamic shape principal components (PCs) describing >1% of variance were characterized and tested for associations with prevalent and incident cardiometabolic diseases, including ischemic heart disease (IHD), heart failure (HF), significant atrioventricular block (AVB), and atrial fibrillation (AF), and for independent predictive power alongside standard CMR measures. Genome-wide association studies (GWAS) were performed to identify candidate genes and biological pathways, and polygenic risk scores (PRS) were assessed for disease associations. Mendelian randomization (MR) was performed to test causality of observed disease associations. Results: We identified 14 dynamic cardiac shape PCs capturing 83.3% of total dynamic cardiac shape variance. These PCs captured distinct functional remodeling patterns such as variation in annular plane systolic excursion, while remaining only modestly correlated with standard CMR measures. All 14 PCs were associated with at least one incident cardiometabolic disease, with the strongest associations observed for incident IHD, HF, and AVB. Notably, incorporating dynamic shape PCs improved the prediction of incident IHD beyond standard CMR measures. GWAS identified 75 genetic loci associated with dynamic shape, including 14 variants previously unreported for cardiac traits, and candidate genes demonstrated enrichment in pathways related to cardiac development and contractile function. PRS derived from dynamic shape loci were significantly associated with multiple outcomes, most prominently HF. 
MR identified significant causal relationships between several PCs and cardiometabolic disease. Conclusions: Dynamic cardiac shape features capture aspects of cardiac structure and function not fully represented by standard CMR measures. These features are strongly associated with incident cardiometabolic disease and provide new insights into the genetic architecture of cardiac remodeling. Clinical perspective. What is new? Genetic and outcome relationships with a dynamic statistical shape model capturing both left and right ventricles at end-diastole and end-systole. Demonstration of incremental value over existing cardiac shape models, through capture of functional remodeling not represented by standard imaging measures. Identification of genetic susceptibility loci for dynamic cardiac shape, including 14 variants not previously reported for cardiac traits. What are the clinical implications? The results enhance our understanding of the genetic architecture of dynamic cardiac shape and function in the general population and clarify their relationships with other cardiovascular endophenotypes and incident cardiometabolic diseases. Newly identified candidate genes expand the biological pathways implicated in cardiac remodeling and provide targets for future functional and mechanistic studies. The improved prediction of incident cardiometabolic disease, particularly ischemic heart disease, achieved by adding dynamic shape PCs to traditional CMR measures suggests potential value for their inclusion in evaluation of patients.

9

Comprehensive Exome Sequencing in Swedish Patients with Spontaneous Coronary Artery Dissection

Gunnarsson, C.; Ellegard, R.; Ahsberg, J.; Huda, S.; Andersson, J.; Dworeck, C. F.; Glaser, N.; Erlinge, D.; Loghman, H.; Johnston, N.; Mannila, M.; Pagonis, C.; Ravn-Fischer, A.; Rydberg, E.; Welen Schef, K.; Tornvall, P.; Sederholm Lawesson, S.; Swahn, E. E.

2026-04-24 genetic and genomic medicine 10.64898/2026.04.22.26351535 medRxiv

Top 0.6%

0.1%

Background: Spontaneous coronary artery dissection (SCAD) is a well-recognised cause of acute coronary syndrome, particularly among women without conventional cardiovascular risk factors. Increasing evidence indicates a genetic contribution; however, the underlying genetic architecture of SCAD remains insufficiently understood. Objective: The aim of this study was to assess the prevalence of rare variants in previously reported SCAD-associated genes and to explore the potential presence of novel genetic alterations in well-characterised Swedish patients with SCAD. Methods: The study comprised 201 patients enrolled in SweSCAD, a national project examining the clinical characteristics, aetiology, and outcomes of SCAD. All individuals had a confirmed diagnosis based on invasive coronary angiography. Comprehensive exome sequencing was performed to identify rare variants contributing to disease susceptibility. Results: Genetic variants that have been associated with SCAD according to current clinical genetics practice for variant reporting were identified in approximately 4% of patients. In addition, rare potentially relevant variants were detected in almost 60% of patients in genes associated with vascular integrity and vascular remodelling. Conclusion: This study supports SCAD as a genetically complex arteriopathy, driven by rare high-impact variants together with broader polygenic susceptibility. Variants in collagen, vascular extracellular matrix, and oestrogen-responsive pathways provide biologically plausible links to female-predominant disease. Although the diagnostic yield of clearly actionable variants is modest, these findings support broader genomic evaluation beyond overt syndromic presentations and highlight the need for larger integrative genomic and functional studies to refine risk stratification and management.

10

Beyond Histology: A Validated CUBIC-Based Workflow for Volumetric Analysis of Follicles and Cortical Vasculature in Human Ovarian Tissue

Pavlidis, D. I.; Fischer, C. E.; Jennings, M. A.; Machlin, J. H.; Jan, V.; Baker, B. M.; Shikanov, A.

2026-04-21 bioengineering 10.64898/2026.04.16.718954 medRxiv

Top 0.6%

0.1%

Research question: Can tissue clearing, combined with volumetric imaging, enable reliable, quantitative three-dimensional analysis of follicles and vasculature in intact human ovarian tissue? Design: A CUBIC-based clearing protocol was adapted for human ovarian medulla and cryopreserved cortex. Tissue from reproductive-aged donors was cleared, fluorescently labeled, and imaged using confocal and light sheet microscopy. Tissue expansion, imaging depth, and vascular morphometrics were quantified, and follicle density was compared to conventional histology. Results: Clearing produced optically transparent tissue with a linear expansion factor of 1.2 across cortex and medulla. Imaging depth increased 6.5-11-fold in cortex and 6-8-fold in medulla. Follicle density measurements in immunolabeled cleared cortex were comparable to histology, supporting the validity of volumetric follicle quantification. Light sheet microscopy of lectin-labeled cortex revealed no significant donor-to-donor differences in vascular morphometrics, including mean vessel diameters of 12-14 µm, branch point densities of 632-965 points/mm³, vessel length densities of 117-175 mm/mm³, and volume fractions of 1.9-2.3%. Volumetric imaging further illustrated heterogeneous spatial relationships between follicles and surrounding vessels. Conclusion: Tissue clearing and volumetric imaging complement routine histology and enable quantitative three-dimensional investigation of follicle-vascular interactions in intact human ovarian tissue, providing a framework for advancing fertility preservation and ovarian tissue transplantation research.

11

TomoSwin3D: a Swin3D Transformer for the Identification and Classification of Macromolecules in 3D Cryo-ET Tomograms

Dhakal, A.; Gyawali, R.; Cheng, J.

2026-04-21 biochemistry 10.64898/2026.04.17.719219 medRxiv

Top 0.7%

0.1%

Cryo-electron tomography (cryo-ET) enables in situ three-dimensional visualization of many protein complexes and other macromolecular assemblies such as ribosomes in cells, yet automated macromolecule particle identification in 3D cryo-ET tomograms remains a major bottleneck due to dose-limited low signal-to-noise ratios, missing-wedge artifacts, and densely crowded cellular backgrounds. We present TomoSwin3D, an end-to-end three-dimensional (3D) macromolecule particle identification and classification pipeline centered on a Swin Transformer-based U-Net that performs particle identification and classification and outputs particle centroid coordinates. TomoSwin3D leverages a multi-channel input representation that augments raw tomogram densities with complementary 3D feature maps capturing edge strength (Sobel gradients), local contrast enhancement (morphological top-hat), and multiscale blob responses (Difference-of-Gaussians), improving detectability of small and low-contrast targets. To better preserve particle geometry and avoid hand-crafted shape assumptions, it adopts occupancy-preserving supervision that directly uses available 3D instance masks rather than heuristic Gaussian/spherical labels and applies scalable patch-wise inference followed by lightweight post-processing (connected-component analysis, size filtering, centroid extraction) for robust centroid coordinate extraction. Across diverse simulated and experimental cryo-ET tomogram benchmarks including SHREC 2021 and 2020 test datasets, EMPIAR dataset, and Cryo-ET data portal dataset, TomoSwin3D achieves strong and consistent performance in detecting proteins and other particles, outperforming existing methods, with a pronounced advantage in picking hard, small protein particles. These results establish TomoSwin3D as a scalable and accurate solution for high-throughput cryo-ET macromolecule particle picking and downstream subtomogram averaging.
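The post-processing steps named in the abstract (connected-component analysis, size filtering, centroid extraction) can be sketched in plain Python on a small binary mask. This is an illustration of the general technique, not the TomoSwin3D implementation: the 6-connectivity, the `min_voxels` threshold, and the nested-list mask format are all assumptions.

```python
from collections import deque


def particle_centroids(mask, min_voxels=2):
    """Label 6-connected foreground voxels and return centroids of kept components.

    mask is a nested list indexed as mask[z][y][x] with truthy foreground voxels.
    Components smaller than min_voxels are discarded (size filtering).
    """
    nz, ny, nx = len(mask), len(mask[0]), len(mask[0][0])
    neighbours = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    seen = set()
    centroids = []
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if not mask[z][y][x] or (z, y, x) in seen:
                    continue
                # Breadth-first flood fill collects one connected component.
                queue = deque([(z, y, x)])
                seen.add((z, y, x))
                voxels = []
                while queue:
                    cz, cy, cx = queue.popleft()
                    voxels.append((cz, cy, cx))
                    for dz, dy, dx in neighbours:
                        pz, py, px = cz + dz, cy + dy, cx + dx
                        inside = 0 <= pz < nz and 0 <= py < ny and 0 <= px < nx
                        if inside and mask[pz][py][px] and (pz, py, px) not in seen:
                            seen.add((pz, py, px))
                            queue.append((pz, py, px))
                if len(voxels) >= min_voxels:
                    count = len(voxels)
                    centroids.append(
                        tuple(sum(v[axis] for v in voxels) / count for axis in range(3))
                    )
    return centroids


# Toy 3x3x3 mask (hypothetical): one two-voxel particle and one isolated voxel.
toy_mask = [[[0] * 3 for _ in range(3)] for _ in range(3)]
toy_mask[0][0][0] = 1
toy_mask[0][0][1] = 1
toy_mask[2][2][2] = 1   # removed by the size filter

toy_centroids = particle_centroids(toy_mask)
```

On real tomograms an array library would be used for speed; the pure-Python version only shows the order of the three steps.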

12

Transcriptomic analysis of organotypic porcine retina cultures

Khosravi, S.; Giorgio, G.; Staurenghi, F.; Schoenberger, T.; Gross, P.; Ried, M.; Frankenhauser, J.; Eder, S.; Markert, E.; Bakker, R.; Babaei, S.; Zippel, N.

2026-04-21 molecular biology 10.64898/2026.04.16.718959 medRxiv

Top 0.7%

0.1%

Porcine organotypic retinal explant cultures are widely used to study retinal neurodegeneration under controlled conditions, but the biological processes that occur in the retinal explant over time due to preparation-induced injury and culture are not well understood. Here, we generated a time-resolved transcriptomic reference for porcine neural retinal explants maintained ex vivo for 10 days. Global expression profiles are strongly separated by culture time, with Day 0 clearly distinct from cultured samples, and Day 7 and Day 10 showing the highest similarity, indicating a transition toward a later stabilized state. Across the time course, 3,187 genes were differentially expressed relative to Day 0, with the largest shifts occurring at an early stage of culture (Day 1-Day 3). Pathway-level analyses revealed coordinated remodeling involving inflammatory signaling and metabolic/bioenergetic changes, including reduced mitochondrial and oxidative phosphorylation-related programs at later time points. These data provide a foundation for interpreting results generated in this model, differentiating changes inherent to the explant culture from treatment-specific effects, and selecting appropriate experimental windows for mechanistic studies of retinal degeneration.

13

Vision Language Model for Coronary Angiogram Analysis and Report Generation: Development and Evaluation Study

Jiang, Q.; Ke, Y.; Sinisterra, L. G.; Elangovan, K.; Li, Z.; Yeo, K. K.; Jonathan, Y.; Ting, D. S. W.

2026-04-21 cardiovascular medicine 10.64898/2026.04.19.26351241 medRxiv

Top 0.7%

0.1%

Coronary artery disease is a leading cause of morbidity and mortality. Invasive coronary angiography is currently the gold standard in disease diagnosis. Several studies have attempted to use artificial intelligence (AI) to automate its interpretation, with varying levels of success. However, most existing studies cannot generate detailed angiographic reports beyond simple classification or segmentation. This study aims to fine-tune and evaluate the performance of a Vision-Language Model (VLM) in coronary angiogram interpretation and report generation. Using twenty thousand angiogram keyframes from 1,987 patients collated across four unique datasets, we fine-tuned the InternVL2-4B model with Low-Rank Adaptor weights to perform stenosis detection, anatomy labelling, and report generation. The fine-tuned VLM achieved a precision of 0.56, recall of 0.64, and F1-score of 0.60 for stenosis detection. In anatomy segmentation, it attained a weighted precision of 0.50, recall of 0.43, and F1-score of 0.46, with higher scores in major vessel segments. Report generation integrating multiple angiographic projection views yielded an accuracy of 0.42, negative predictive value of 0.58 and specificity of 0.52. This study demonstrates the potential of using VLM to streamline angiogram interpretation to rapidly provide actionable information to guide management, support care in resource-limited settings, and audit the appropriateness of coronary interventions. AUTHOR SUMMARY: Coronary artery disease carries a heavy disease burden worldwide, and coronary angiography is the gold-standard imaging modality for its diagnosis. Interpreting these complex images and producing clinical reports require significant expertise and time. In this study, we fine-tuned and investigated an open-source VLM, InternVL2-4B, to interpret and report coronary angiogram images in key tasks including stenosis detection, anatomy identification, as well as full report generation. 
We also benchmarked the fine-tuned InternVL2-4B against a state-of-the-art segmentation model, YOLOv8x, which was evaluated on the same test sets. We examined how machine learning metrics like the intersection over union score may not fully capture the clinical accuracy of model predictions and discussed the limitations of relying solely on these metrics for evaluating clinical AI systems. Although the model has not yet achieved expert-level interpretation, our results demonstrate the potential and feasibility of automating the reporting of coronary angiograms. Such systems could potentially assist cardiologists by improving reporting efficiency, highlighting lesions that may require review, and enabling automated calculations of clinical scores such as the SYNTAX score.
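The stenosis-detection F1-score follows directly from the reported precision and recall, as the short check below shows. The function name is illustrative. The same formula is not guaranteed to reproduce the anatomy-task figure, because a weighted F1 is usually an average of per-class F1-scores rather than the harmonic mean of weighted precision and weighted recall.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Stenosis detection values quoted in the abstract.
stenosis_f1 = round(f1_score(0.56, 0.64), 2)

print(stenosis_f1)
```

The result rounds to the 0.60 reported for stenosis detection.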

14

SIMO - Single Section Integrative Multi-Omics - spatial mapping of metabolites and lipids combined with region-specific proteomics in a single tissue slice

Hau, K.; Fecke, A.; Hormann, F.-L.; Groba, A.-C.; Melo, L. M. N.; Cansiz, F.; Allies, G.; Hentschel, A.; Chen, J.; Heiles, S.; Tasdogan, A.; Sickmann, A.; Smith, K. W.

2026-04-21 biochemistry 10.64898/2026.04.17.719206 medRxiv

Top 0.9%

0.0%

Technological advances in biomedical sciences have accelerated multi-omics research, enabling high-resolution spatial mapping of diverse molecular compound classes. However, integrating spatial omics often requires serial tissue sections, limiting the alignment correlation across modalities. We present a single-section integrative multi-omics (SIMO) workflow that combines metabolite and lipid imaging with histopathology and region-specific proteomics. Using MALDI-MSI, tissue staining, and laser microdissection (LMD), SIMO delivers comprehensive metabolic, lipidomic, and proteomic insight from the same sample. Using mouse cardiac tissue, we develop, control, and validate the methodology, resulting in ~60 imaged lipids and ~60 imaged metabolites at 20 µm pixel size and subsequently spatial proteomics by LMD, detecting over 5,000 proteins from the same tissue. To demonstrate the capabilities of the workflow in a preclinical context, we apply SIMO to a metastasizing melanoma PDX model, identifying over 100 spatially localized lipids and metabolites, and over 5,000 proteins across metastases and non-tumor tissues in liver. SIMO enables precise ROI selection, statistical comparison of protein regulation, and alignment of metabolic and lipidomics pathways across spatial omics and region-specific proteomics, demonstrating its value as a spatial multi-omics platform.

15

Ensemble Approaches to Screening, Diagnosis, and Subtyping of Multiple Sclerosis

Yang, I. Y.; Patil, A.; Jin, O.; Loud, S.; Buxhoeveden, S.; Zhang, D. Y.

2026-04-21 genetic and genomic medicine 10.64898/2026.04.19.26351230 medRxiv

Top 1.0%

0.0%

Multiple sclerosis (MS) is a debilitating disease affecting more than 1 million Americans, and today is assessed primarily through magnetic resonance imaging (MRI) and observational clinical symptoms. Given the autoimmune nature of MS, we hypothesized that high-dimensional gene expression data from peripheral blood mononuclear cells (PBMCs), when analyzed with the assistance of AI, may collectively serve as valuable biomarkers for the real-time risk and progression of MS. Here, we present PBMC RNA sequencing (RNAseq) results from N=997 samples, including 540 MS, 221 neuromyelitis optica (NMO), and 149 healthy controls. We constructed and optimized ensemble models for three clinical outcomes: (1) discrimination of early MS (EDSS ≤ 2.0) from healthy individuals with 74% AUC at 100% coverage, (2) differential diagnosis of MS from NMO with 91% AUC at 80% coverage, and (3) subtyping RRMS from progressive MS with 79% AUC at 80% coverage. To our knowledge, no prior molecular test has been reported for any of these three MS clinical tasks, and these results may have immediate impact on clinical management of MS patients. Two innovations improved the stratification accuracy of our models: selection of gene sets based on expression variance in disease states, and use of non-linear rank sort and conviction weighting in the ensemble score calculation.
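
The abstract reports AUC "at 80% coverage" without defining the term. One common reading, assumed here, is selective prediction: the model abstains on the least confident samples and AUC is computed on the rest. The sketch below illustrates that reading only. The confidence measure (distance of the score from 0.5), the function names, and the toy scores are assumptions, not the authors' ensemble or data.

```python
import numpy as np

def auc(scores, labels):
    """Rank-based AUC: chance that a random positive outscores a random negative (ties count half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUC needs at least one positive and one negative sample")
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def auc_at_coverage(scores, labels, coverage, threshold=0.5):
    """AUC on the most confident `coverage` fraction of samples (assumed definition)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n_keep = max(1, int(round(coverage * len(scores))))
    confidence = np.abs(scores - threshold)          # assumed confidence measure
    keep = np.argsort(-confidence, kind="stable")[:n_keep]
    return auc(scores[keep], labels[keep])

# Toy scores and labels, invented for illustration.
scores = [0.05, 0.20, 0.45, 0.55, 0.48, 0.52, 0.80, 0.95, 0.90, 0.10]
labels = [0, 0, 1, 0, 0, 1, 1, 1, 1, 0]

print(auc_at_coverage(scores, labels, 1.0))  # all 10 samples scored
print(auc_at_coverage(scores, labels, 0.8))  # two least confident samples withheld
```

Under this reading, lowering coverage trades the number of patients who receive a call against the reliability of the calls that are made, so figures quoted at 100% and at 80% coverage are not directly comparable.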

16

A Multi-Omics Computational Pipeline for Systematic Discovery of Retired Self-Antigens as Cancer Vaccine Targets

Wang, V.; Deng, S.; Aguilar, R.

2026-04-22 genetic and genomic medicine 10.64898/2026.04.20.26351288 medRxiv

Top 1.0%

0.0%

Background: The retired antigen hypothesis, introduced by Tuohy and colleagues, proposes that tissue-specific proteins expressed conditionally during early life or reproductive stages, then silenced in normal aging tissue, represent safe and effective cancer vaccine targets when re-expressed in tumors. To date, discovery of retired antigens has relied entirely on hypothesis-driven wet lab work, limiting throughput. Methods: Here we present RADAR (Retired Antigen Discovery and Ranking), a multi-omics computational pipeline implemented on a standard server that systematically identifies retired antigen candidates. RADAR comprises four core discovery layers integrating: 1) Genotype-Tissue Expression Portal (GTEx) normal tissue expression, 2) TCGA tumor re-expression, 3) DNA methylation, and 4) miRNA regulatory networks, each applied sequentially to identify genes exhibiting the epigenetic and post-transcriptional hallmarks of tissue-specific retirement followed by tumor re-activation. Candidate characterization is further supported by three automated modules: 1) protein-level safety screening via the Human Protein Atlas, 2) molecular subtype enrichment analysis, and 3) cross-cancer confirmation, which execute automatically when the relevant data are available for the selected cancer type. Results: The pipeline independently validated known targets including alpha-lactalbumin (LALBA, the basis of the Tuohy Phase 1 triple-negative breast cancer vaccine trial) and anti-Müllerian hormone (AMH), consistent with Tuohy's ovarian cancer vaccine program targeting AMHR2, and rediscovered multiple known cancer-testis antigens (MAGEA1, MAGEC1, SSX1) as positive controls. Among 4,664 initial candidates derived from GTEx, the pipeline identified 20 high-confidence retired antigen candidates passing all filters. DCAF4L2, COX7B2, TEX19, and CT83 emerge as the highest-priority novel candidates for experimental validation, demonstrating zero expression in critical somatic organs, strong epigenetic silencing, and significant re-expression across multiple cancer types. Conclusion: RADAR provides the first systematic computational framework for retired antigen discovery, offering a reproducible and scalable approach to expanding the cancer immunoprevention pipeline beyond individually characterized targets. The pipeline is fully reproducible, requires no specialized hardware, and is immediately extensible to additional TCGA cancer types.

17

MedSAM2-CXR: A Box-Latent Framework for Chest X-ray Classification and Report Generation

Hakata, Y.; Oikawa, M.; Fujisawa, S.

2026-04-22 health informatics 10.64898/2026.04.20.26351338 medRxiv

Top 1.0%

0.0%

Who is affected: In Japan, approximately 100 million chest radiographs (CXRs) are acquired annually, while only about 7,000 board-certified diagnostic radiologists practice nationwide (Japan Radiological Society workforce statistics; OECD Health Statistics, most recent available year). This implies an average workload exceeding 10,000 imaging studies per radiologist per year if all CXRs were attributed to board-certified diagnostic radiologists (an upper-bound estimate, because in practice many CXRs are primarily read by non-radiologist physicians). In settings such as night shifts, weekends, remote islands, and regional care networks, non-radiologist physicians frequently act as primary readers. Despite strong demand for AI assistance, existing systems are typically limited by one of three shortcomings -- poor cross-institutional generalization, limited interpretability, or inability to generate draft reports -- and consequently see limited clinical deployment. What we built: We propose a Box-Latent Trinity that embeds each image as a hyperrectangle parameterized by a center c and a radius r, rather than as a single point in a latent space. We further introduce BL-TTA (Box-Latent Test-Time Augmentation), which approximately closes the train-inference gap (exact in the N → ∞ limit; N = 8 suffices in practice) by averaging predictions over samples drawn from within the latent box at inference time. Both components are implemented on top of the frozen MedSAM2 medical imaging foundation model. A single box representation simultaneously supports three functions: (A) theoretically grounded source selection, (B) device-invariant augmentation, and (C) case-based retrieval-augmented generation (RAG). Each prediction is accompanied by retrieved similar prior cases, a calibrated confidence estimate, and clinical-guideline references. 
How well it performs: On the Open-i CXR corpus (2,954 image-report pairs) under a patient-level 80/10/10 split and 5-seed reproducibility, the full system B5 achieves macro area under the receiver-operating-characteristic curve (macro-AUROC) 0.639 (best-seed test; 5-seed mean 0.626, Table 2; absolute +0.015 over the strongest same-backbone baseline, Merlin-style 0.624), elementwise accuracy 0.753 (absolute +0.072 over Merlin-style 0.681 -- equivalent to approximately 7 fewer label-level errors per 100 (label, image) predictions across 14 finding labels, not per 100 images), and report label-F1 0.435 (absolute +0.086, relative +25% over the strongest same-backbone report-generation baseline, Bootstrapping-style 0.349). Under simulated pixel-space device-shift intensities up to twice the training distribution, AUROC degrades by only 0.014. Brier score (macro) is 0.061; Cohen's κ between two independent rule-based label extractors is 0.702 (substantial agreement); the box radius yields an out-of-distribution (OOD) detection AUROC of 0.595; and the framework provides four structural explainable-AI (XAI) outputs -- retrieved similar cases, confidence tier, per-axis uncertainty, and visual saliency -- which we jointly quantify in a single CXR study, a combination that, to our knowledge, has not been reported previously. Path to deployment: Because the complete experiment can be reproduced in under two hours on a consumer-grade GPU (NVIDIA RTX 4060, 8 GB VRAM), the framework can run on compute resources already available at typical healthcare institutions. 
The approach thus supports the practical delivery of evidence-grounded diagnostic support to night shifts, remote-island care, and secondary readings in health checkups -- settings in which a board-certified radiologist is not locally available. One-sentence summary: Reproducible end-to-end in under two hours on a single consumer-grade GPU, the proposed framework outperforms the strongest same-backbone medical-AI baselines on three principal metrics, maintains accuracy under simulated device shifts, and automatically drafts evidence-grounded radiology reports, offering a reproducible and compute-efficient direction toward reducing the reading burden of Japanese radiologists, subject to external validation.
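
The box-latent test-time augmentation described in this abstract (averaging predictions over N samples drawn from the box with center c and radius r) can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not give the sampling distribution, so uniform sampling inside the hyperrectangle is assumed, and the latent dimension, the linear-sigmoid `head`, and the name `bl_tta_predict` are hypothetical stand-ins for the authors' components built on frozen MedSAM2 features.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bl_tta_predict(center, radius, head, n_samples=8, rng=None):
    """Average head predictions over points sampled inside the box [c - r, c + r].

    Uniform sampling is an assumption; the abstract only says samples are
    drawn from within the latent box.
    """
    rng = np.random.default_rng() if rng is None else rng
    center = np.asarray(center, dtype=float)
    radius = np.asarray(radius, dtype=float)
    offsets = rng.uniform(-1.0, 1.0, size=(n_samples, center.size))
    samples = center + offsets * radius              # each row lies inside the box
    preds = np.stack([head(z) for z in samples])     # shape (n_samples, n_labels)
    return preds.mean(axis=0)

# Hypothetical setup: 16-dimensional latent, 14 finding labels, random linear head.
rng = np.random.default_rng(0)
weights = rng.normal(size=(14, 16))
head = lambda z: sigmoid(weights @ z)

center = rng.normal(size=16)
radius = np.full(16, 0.1)
probs = bl_tta_predict(center, radius, head, n_samples=8, rng=rng)
print(probs.shape)  # (14,)
```

With a radius of zero the box collapses to a point and the function reduces to an ordinary single forward pass through the head, which is a convenient sanity check.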

18

Practical quantification of immunohistochemistry antigen concentrations and reaction-diffusion parameters

Peale, F. V.; Perng, W.; Mbiribindi, B.; Andrews, B. T.; Wang, X.; Dunlap, D.; Eastham, J.; Ngu, H.; Chernyshev, A.; Orlova, D.

2026-04-21 pathology 10.64898/2026.04.16.719078 medRxiv

Top 1%

0.0%

The immunohistochemistry (IHC) methods widely used in diagnostic medicine and biomedical research are kinetically complex reaction-diffusion processes that, ideally, produce stain intensities correlated with the local antigen concentration. Yet after 75 years of use, practical theoretical tools to rigorously plan and interpret IHC experiments are still lacking. Because modeling the reactions requires time-consuming computer simulation, impractical for regular use, most protocols are optimized empirically, without detailed knowledge of the reaction rates and antigen-antibody equilibria. The resulting stain intensities can be calibrated against standards with known antigen abundance, but they are typically not interpretable in terms of chemical antigen concentrations. To address these limitations, we developed a fast interpolation method to model reaction-diffusion behavior, and experimental methods to characterize IHC kinetic parameters in formalin-fixed paraffin-embedded (FFPE) samples. Used together, these allow experimental measurement of both the chemical concentration of antigen in the sample and the reaction-diffusion parameters consistent with the assay results. Results show 1) direct immunofluorescent detection has low nanomolar sensitivity with >1000-fold dynamic range, and 2) antibody diffusion rates in FFPE samples can be >1000-fold slower than in aqueous solutions, producing diffusion-limited conditions in which the IHC reaction time course may depend on the sample antigen concentration. Awareness of these details is necessary to avoid potential underestimation of both the absolute and relative antigen concentrations in different samples that may occur if staining is stopped before reaching equilibrium. Software tools are provided to allow users to rapidly model IHC reaction time courses and to fit experimental time course data with candidate reaction parameters. 
The principles described here apply equally to other tissue-based "spatial omics" analyses and should be considered when designing and interpreting experiments requiring any macromolecule to diffuse into and react in a tissue section. SIGNIFICANCE: The theoretical and experimental framework described here advances IHC staining from a qualitative or semi-quantitative method towards a more rigorously quantitative assay. The practical ability to predict IHC reaction kinetics and fit reaction parameters to experimental data has the potential to advance IHC applications in diagnostic medicine and biomedical research in three ways: 1) interpretation of experimental and diagnostic samples stained under different conditions can be more objective, facilitating comparison of results from different protocols and different laboratories; 2) IHC staining can be interpreted as molar chemical antigen-antibody concentrations calculated from the reaction parameters measured in the studied sample; 3) the correlation between antigen concentration and biological behavior can be examined more reliably. Practical software tools are provided.
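
To make the diffusion-limited behavior in this abstract concrete, the sketch below runs a brute-force explicit finite-difference simulation of antibody diffusing into a section and binding antigen reversibly. It is not the authors' fast interpolation method or their software. The one-dimensional geometry, the boundary conditions, the function name `simulate_staining`, and every parameter value are illustrative assumptions.

```python
import math
import numpy as np

def simulate_staining(antigen, antibody, diff_coeff, k_on, k_off,
                      thickness, t_end, n_cells=20):
    """Return the bound-complex concentration across section depth at time t_end.

    Assumed units: concentrations in M, lengths in um, time in s,
    diff_coeff in um^2/s, k_on in 1/(M s), k_off in 1/s.
    """
    dx = thickness / n_cells
    # Time step limited by diffusion stability (dx^2 / 2D) and by the reaction rate.
    dt = min(0.2 * dx**2 / diff_coeff,
             0.1 / (k_on * (antigen + antibody) + k_off))
    n_steps = math.ceil(t_end / dt)
    free = np.zeros(n_cells)    # free antibody in each depth cell
    bound = np.zeros(n_cells)   # antibody-antigen complex in each depth cell
    for _ in range(n_steps):
        # Top surface sees the antibody bath; bottom (slide) is a no-flux wall.
        padded = np.concatenate(([antibody], free, [free[-1]]))
        lap = (padded[2:] - 2.0 * padded[1:-1] + padded[:-2]) / dx**2
        rate = k_on * free * (antigen - bound) - k_off * bound
        free = free + dt * (diff_coeff * lap - rate)
        bound = bound + dt * rate
    return bound

# Illustrative values only: a 4 um section, slow diffusion, antigen in excess of antibody.
params = dict(antigen=1e-6, antibody=1e-8, diff_coeff=0.04,
              k_on=1e5, k_off=1e-4, thickness=4.0)
short = simulate_staining(t_end=600.0, **params)
long = simulate_staining(t_end=3600.0, **params)
print(short.mean() / params["antigen"], long.mean() / params["antigen"])
```

In this toy setting the bound fraction is still rising between the two time points and is higher near the surface than at depth, which is the kind of pre-equilibrium behavior the abstract warns can lead to underestimated antigen concentrations. A simulation like this takes thousands of time steps per condition, which illustrates why simulation-based modeling is described there as impractical for routine use.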

19

Consensus Through Diversity: A Comprehensive Benchmark of Multi-Omic Approaches for Precision Breast Oncology

Sionakidis, A.; Pinilla Alba, K.; Abraham, J.; Simidjievski, N.

2026-04-21 bioinformatics 10.64898/2026.04.17.719159 medRxiv

Top 1%

0.0%

Emerging multi-omic profiling has made it feasible to subtype disease using multiple molecular layers. However, inconsistent preprocessing, heterogeneous implementations, variable evaluation, and limited reproducibility often constrain method selection. Here, we systematically benchmark 22 publicly available unsupervised approaches for bulk data on the TCGA-BRCA cohort across five modalities (RNA-seq, miRNA, DNA methylation, copy numbers, single nucleotide polymorphisms) and validate findings in two independent datasets, enabling a multi-layered comparison of performance, heterogeneous data support and interpretability. Most approaches fuse multi-omic data to produce a two-cluster solution largely aligned with ER status, with higher-resolution approaches further refining these into four coherent subclasses (angiogenic luminal, oxidative-phosphorylation/HER2-low luminal, immune-inflamed basal-like, and hyper-proliferative basal-like). Our benchmarking results indicate that methods based on similarity networks can efficiently produce stable, reliable partitions. Matrix factorisation and Bayesian factorisation algorithms produce rich latent representations, allowing quantification of feature and modality contributions, albeit at higher computational cost. Consensus clustering can be used on a case-by-case basis to refine partitions into more robust and generalisable findings. We aggregate our insights into a decision workflow that aligns with study goals, data characteristics, and computational resources, enabling optimal analytic strategies. This comprehensive assessment provides a practical roadmap for investigators seeking to extract reproducible, biologically meaningful subtypes from complex multi-omic datasets. We highlight the different technical and practical benefits and trade-offs that shape the selection and development of multi-omic approaches applied in precision oncology.

20

Cross-ancestry evaluation of idiopathic pulmonary fibrosis genetic risk variants

Nabunje, R.; Guillen-Guio, B.; Hernandez-Beeftink, T.; Joof, E.; Leavy, O. C.; International IPF Genetics Consortium; Maher, T. M.; Molyneux, P.; Noth, I.; Urrutia, A.; Aburto, M.; Flores, C.; Jenkins, R. G.; Wain, L. V.; Allen, R. J.

2026-04-25 genetic and genomic medicine 10.64898/2026.04.17.26349970 medRxiv

Top 1%

0.0%

Genome-wide association studies of idiopathic pulmonary fibrosis (IPF) have identified 35 common genetic risk loci associated with IPF susceptibility. In this study, we evaluated the effects of the reported variants in clinically curated non-European individuals. Despite limited sample sizes, we observed partial replication, limited transferability of some variants and evidence of ancestry-specific effects. The MUC5B promoter variant rs35705950 emerged as the dominant and most consistent signal across ancestries. Our findings highlight the need for larger, well-characterised studies in understudied populations to support robust discovery and translation.